Sparsity in machine learning: theory and practice
Abstract
The thesis explores sparse machine learning algorithms for supervised (classification and regression) and unsupervised (subspace methods) learning. For classification, we review the set covering machine (SCM) and propose new algorithms that directly minimise the SCM's sample compression generalisation error bounds during the training phase. Two of the resulting algorithms are proved to produce optimal or near-optimal solutions with respect to the loss bounds they minimise. One of the SCM loss bounds is shown to be incorrect, and a corrected derivation of the sample compression bound is given along with a framework for allowing asymmetrical loss in sample compression risk bounds. In regression, we analyse the kernel matching pursuit (KMP) algorithm and derive a loss bound that takes into account the dual sparse basis vectors. We make connections to a sparse kernel principal components analysis (sparse KPCA) algorithm and bound its future loss using a sample compression argument. This investigation suggests a similar argument for kernel canonical correlation analysis (KCCA), and applying a similar sparsity algorithm gives rise to the sparse KCCA algorithm. We also propose a loss bound for sparse KCCA using the novel technique developed for KMP. All of the algorithms and bounds proposed in the thesis are elucidated with experiments.
Similar resources
Active Learning: An Approach for Reducing Theory-Practice Gap in Clinical Education
Introduction: The gap between theory and practice in clinical fields, including nursing, is one of the main problems, and many solutions have been suggested to eliminate it. In this article, we investigate a solution through active learning. Methods: In this review article, articles published during 2000-2012 were searched through library references and scientific databases. ...
Matrix Computations & Scientific Computing Seminar
Extracting useful information from high-dimensional data is the focus of today’s statistical research and practice. After broad success of statistical machine learning on prediction through regularization, interpretability is gaining attention and sparsity has been used as its proxy. With the virtues of both regularization and sparsity, Lasso (L1 penalized L2 minimization) and its extensions ha...
A NOVEL FUZZY-BASED SIMILARITY MEASURE FOR COLLABORATIVE FILTERING TO ALLEVIATE THE SPARSITY PROBLEM
Memory-based collaborative filtering is the most popular approach to build recommender systems. Despite its success in many applications, it still suffers from several major limitations, including data sparsity. Sparse data affect the quality of the user similarity measurement and consequently the quality of the recommender system. In this paper, we propose a novel user similarity measure based...
Comparison of the Effect of Theory-Practice and Practice-Theory Teaching Methods in the Anatomy Course on Students' Learning and Satisfaction
Abstract Background: Educational systems need to modify teaching methods in order to be effective. This research was conducted to study the effects of theory-practice and practice-theory methods of anatomy teaching on student learning and satisfaction. Methods: This quasi-experimental survey was carried out on second semester students of Lorestan University. During a 6-week period student...
Sparse Online Learning via Truncated Gradient
We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous—a parameter controls the rate of sparsification from no sparsification to total sparsification. Second, the approach is theoretically motivated, and an instance of it can ...
Four Encounters with System Identification
Model-based engineering is becoming more and more important in industrial practice. System identification is a vital technology for producing the necessary models, and has been an active area of research and applications in the automatic control community for half a century. At the same time, increasing demands require the area to constantly develop and sharpen its tools. This paper deals with h...
Publication date: 2008